Add batched simplices_containing query and Rust default_loss for LearnerND hot loops#13

Merged: basnijholt merged 2 commits into main from batched-queries-and-default-loss on Jun 10, 2026

@basnijholt basnijholt commented Jun 10, 2026

Summary

Profiling LearnerND with the Rust backend active (shipped in adaptive 1.5.0, adaptive#493) shows that only ~14% (2D) / ~25% (3D) of the remaining runtime is spent inside this extension — roughly half is LearnerND's own Python. Two of the remaining hotspots are triangulation-shaped rather than learner-shaped, so they belong here:

component (3D sphere_of_fire, 1500 pts)   share of runtime
adaptive's own Python (learnerND.py)      48%
Rust extension calls                      25%
sortedcontainers (simplex queue)          10%
numpy/scipy + other                       17%

  1. tell_pending's neighbour loop: for every pending point, adaptive loops over all simplices sharing a vertex with the containing simplex and calls tri.point_in_simplex one at a time (~14 checks per point in 2D, ~70 in 3D, ~96% misses) — a barycentric solve plus FFI round-trip per check.
  2. default_loss: a Python wrapper that tuple-splices vertices and values before calling simplex_volume_in_embedding; the wrapper costs 3-7x the Rust math it wraps.
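
To make the first hotspot concrete, here is a minimal pure-Python sketch of the per-candidate pattern on a 2D toy mesh. The helper names (barycentric, point_in_simplex, containing_by_loop) and the mesh are illustrative assumptions, not adaptive's or this extension's actual code; the point is only that every candidate costs one barycentric solve, and most of them miss.

```python
def barycentric(point, tri):
    """Barycentric coordinates of `point` in a 2D triangle (Cramer's rule)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ZeroDivisionError("degenerate triangle")
    px, py = point[0] - x0, point[1] - y0
    b1 = (px * (y2 - y0) - (x2 - x0) * py) / det
    b2 = ((x1 - x0) * py - px * (y1 - y0)) / det
    return (1.0 - b1 - b2, b1, b2)


def point_in_simplex(point, tri, eps=1e-8):
    """True if no barycentric coordinate is below -eps."""
    return all(b >= -eps for b in barycentric(point, tri))


# Toy mesh (assumption): unit square split into 4 triangles around vertex 4.
vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
simplices = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (0, 3, 4)]


def containing_by_loop(point, neighbours):
    """One solve per candidate simplex; returns the hits and the solve count."""
    checks = 0
    hits = []
    for s in neighbours:
        checks += 1
        if point_in_simplex(point, [vertices[i] for i in s]):
            hits.append(s)
    return sorted(hits), checks


hits, checks = containing_by_loop((0.5, 0.2), simplices)
print(hits, checks)
```

In this toy case three of the four solves are misses; with the real vertex stars described above, the miss rate is what makes the loop expensive.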

New APIs

Triangulation.simplices_containing(point, simplex=None, candidates=None, eps=1e-8)

Returns all simplices containing point, as a sorted list of tuples. Instead of one solve per neighbour, it solves once for a simplex known to contain the point: the simplex hint when given and valid (this maps directly onto tell_pending's simplex argument), otherwise the result of locate_point. It then reduces that simplex to the face the point actually lies on and returns the simplices containing that face via the existing facet index. A stale or wrong hint safely falls back to locating the point. Passing candidates instead filters those through point_in_simplex, preserving the original per-candidate semantics. Errors match point_in_simplex (ZeroDivisionError / numpy.linalg.LinAlgError).
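
The face-reduction idea can be sketched in pure Python for a 2D toy mesh. This is a simplified illustration under stated assumptions, not the Rust implementation: simplices_containing_sketch is a hypothetical name, a vertex-to-simplices dictionary stands in for the facet index, and a hint that does not contain the point raises instead of falling back to locate_point.

```python
def barycentric(point, tri):
    """Barycentric coordinates of `point` in a 2D triangle (Cramer's rule)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ZeroDivisionError("degenerate triangle")
    px, py = point[0] - x0, point[1] - y0
    b1 = (px * (y2 - y0) - (x2 - x0) * py) / det
    b2 = ((x1 - x0) * py - px * (y1 - y0)) / det
    return (1.0 - b1 - b2, b1, b2)


# Toy mesh (assumption): unit square split into 4 triangles around vertex 4.
vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
simplices = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (0, 3, 4)]

# Stand-in for the facet index: vertex -> set of simplices that use it.
vertex_to_simplices = {}
for s in simplices:
    for v in s:
        vertex_to_simplices.setdefault(v, set()).add(s)


def simplices_containing_sketch(point, simplex, eps=1e-8):
    """Single solve on a containing simplex, then an index lookup."""
    coords = barycentric(point, [vertices[i] for i in simplex])
    if any(b < -eps for b in coords):
        # The described API falls back to locating the point; the sketch raises.
        raise ValueError("hint does not contain the point")
    # The face the point lies on: vertices with a non-negligible coordinate.
    face = [v for v, b in zip(simplex, coords) if b > eps]
    # Simplices containing that face share all of its vertices.
    return sorted(set.intersection(*(vertex_to_simplices[v] for v in face)))


interior = simplices_containing_sketch((0.5, 0.2), (0, 1, 4))
on_edge = simplices_containing_sketch((0.25, 0.25), (0, 1, 4))
on_vertex = simplices_containing_sketch((0.5, 0.5), (0, 1, 4))
print(interior, on_edge, on_vertex)
```

An interior point reduces to the full simplex (one result), a point on an edge to that edge (the two simplices sharing it), and a point on a vertex to the whole vertex star, all from one barycentric solve.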

default_loss(simplex, values, value_scale=None)

The LearnerND default loss (embedded simplex volume) taking the scaled vertex/value arrays directly, with bulk-copy fast paths for 1-D/2-D f64 numpy arrays (exactly what _compute_loss passes). Signature-compatible with adaptive's loss_per_simplex functions, so the adaptive side can swap it in with one import; value_scale is accepted and unused, like the reference.
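
For readers unfamiliar with the quantity, the following is a minimal pure-Python sketch of an embedded simplex volume with the same call shape. It is an assumption-laden illustration: default_loss_sketch and embedded_volume are hypothetical names, and the volume is computed from a Gram determinant of the edge vectors, which is a swapped-in technique and not necessarily what adaptive's simplex_volume_in_embedding or the Rust code does internally.

```python
import math


def _det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return det


def embedded_volume(points):
    """Volume of an n-simplex given n+1 points in a space of dimension >= n."""
    origin = points[0]
    edges = [[a - b for a, b in zip(p, origin)] for p in points[1:]]
    gram = [[sum(a * b for a, b in zip(u, v)) for v in edges] for u in edges]
    # Clamp tiny negative round-off so degenerate simplices give 0.0.
    return math.sqrt(max(_det(gram), 0.0)) / math.factorial(len(edges))


def default_loss_sketch(simplex, values, value_scale=None):
    """Embed each vertex as (coordinates..., values...) and take the volume.

    value_scale is accepted and unused, mirroring the signature described above.
    """
    points = []
    for x, y in zip(simplex, values):
        y = list(y) if hasattr(y, "__iter__") else [y]
        points.append([*x, *y])
    return embedded_volume(points)


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
flat = default_loss_sketch(triangle, [0.0, 0.0, 0.0])
tilted = default_loss_sketch(triangle, [0.0, 0.0, 1.0])
vector_valued = default_loss_sketch(triangle, [(0.0, 0.0)] * 3)
print(flat, tilted, vector_valued)
```

A flat triangle gives its ordinary area, while a triangle whose values differ gives a larger area in the embedding, which is what makes the loss favour regions where the function varies.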

Performance

End-to-end LearnerND on adaptive 1.5.0 with both wired in the way the adaptive-side integration would (examples/learnernd_batched_apis.py):

2D ring_of_fire, 3000 points:
  baseline (Rust backend, adaptive >= 1.5)     0.46s  (1.00x)
  + rust default_loss                          0.41s  (1.12x)
  + simplices_containing tell_pending          0.44s  (1.04x)
  + both                                       0.40s  (1.17x, identical points)

3D sphere_of_fire, 1500 points:
  baseline (Rust backend, adaptive >= 1.5)     0.74s  (1.00x)
  + rust default_loss                          0.65s  (1.15x)
  + simplices_containing tell_pending          0.62s  (1.19x)
  + both                                       0.53s  (1.40x, identical points)

Every configuration samples identical points to the baseline. The win grows with dimension (vertex stars get bigger), which is exactly where LearnerND is used.

Deliberately not moved here: the simplex priority queue (generic data structure, fixable in adaptive with a lazy-deletion heap) and the pending-point/subtriangulation bookkeeping (that would absorb LearnerND's state machine and break user-pluggable loss functions).
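
For context on the queue remark, a lazy-deletion heap can be sketched with the standard library's heapq. This is a generic illustration of the technique, assuming simplices are hashable tuples and losses are plain floats; LazyHeap is a hypothetical class, not adaptive's queue or a proposed patch.

```python
import heapq


class LazyHeap:
    """Max-priority queue where removal only marks an entry as dead."""

    def __init__(self):
        self._heap = []  # entries are (-priority, item)
        self._live = {}  # item -> priority that is currently valid

    def push(self, item, priority):
        # Re-pushing an item supersedes its older heap entries.
        self._live[item] = priority
        heapq.heappush(self._heap, (-priority, item))

    def discard(self, item):
        # O(1): the stale heap entry is skipped later, when it surfaces.
        self._live.pop(item, None)

    def pop(self):
        while self._heap:
            neg_priority, item = heapq.heappop(self._heap)
            if self._live.get(item) == -neg_priority:
                del self._live[item]
                return item, -neg_priority
        raise KeyError("pop from an empty queue")


queue = LazyHeap()
queue.push((0, 1, 4), 0.3)
queue.push((1, 2, 4), 0.9)
queue.push((2, 3, 4), 0.5)
queue.discard((1, 2, 4))  # e.g. the simplex was subdivided
first = queue.pop()
second = queue.pop()
print(first, second)
```

The trade-off is that dead entries stay in the heap until they reach the top, in exchange for cheap removals and pushes.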

Also includes a README benchmark refresh against adaptive 1.5.0: the usage section now leads with the automatic backend selection (pip install "adaptive[rust]"), and the end-to-end table was re-measured — LearnerND's pure-Python path got faster in 1.5.0, so the honest headline ratio is now 3.3× (was 3.7×).

Testing

  • 16 new tests in tests/test_batched_queries.py: brute-force parity in 2D/3D/4D (including vertex probes), equivalence with the exact tell_pending neighbour loop it replaces, hint/stale-hint/empty-hint behaviour, candidates filtering, eps forwarding, and default_loss parity against adaptive.learner.learnerND.default_loss for scalar/vector values, numpy arrays, and plain lists.
  • Full suite: 131 passed against the locked adaptive (CI uses uv sync --locked, adaptive 1.3.2). Against adaptive 1.5.0, everything passes except the pre-existing test_learnernd_with_neighbor_aware_loss_runs failure: 1.5.0 auto-selects the Rust backend, which surfaces the known collinear simplex_volume_in_embedding divergence (raises where the reference returns 0.0) — unrelated to this change, needs its own fix before bumping the lock.
  • cargo clippy -D warnings, cargo fmt, ruff (0.11.0) all clean.

Follow-up: a small adaptive-side PR can adopt both — tell_pending collapses to one simplices_containing call, and triangulation_backend re-exports default_loss.

Profiling LearnerND with the Rust backend (adaptive PR #493) showed only
14-25% of runtime left inside this extension; two of the remaining Python
hotspots are triangulation-shaped and move here:

- Triangulation.simplices_containing(point, simplex=None, candidates=None,
  eps=1e-8): all simplices containing a point in one call. Instead of one
  barycentric solve per neighbouring simplex (the loop LearnerND.tell_pending
  runs in Python today, ~14 checks per point in 2D, ~70 in 3D), it solves
  once for a simplex known to contain the point (the hint, or locate_point),
  reduces it to the face the point lies on, and looks up the simplices
  containing that face in the facet index.
- default_loss(simplex, values, value_scale=None): the LearnerND default
  loss (embedded simplex volume) taking the scaled vertex/value arrays
  directly, signature-compatible with loss_per_simplex.

End-to-end with both wired into LearnerND (examples/learnernd_batched_apis.py):
1.18x in 2D and 1.39x in 3D on top of the Rust backend, with identical
sampled points.

adaptive 1.5.0 ships the automatic Rust-backend selection, so the usage
section now leads with `pip install "adaptive[rust]"` (monkey-patching is
only needed for < 1.5.0) and the example points at the release instead of
the PR branch. Re-measured all tables against 1.5.0: standalone numbers are
unchanged within noise; LearnerND's pure-Python path got faster in 1.5.0,
so the honest end-to-end ratio is now 3.3x (was 3.7x). Adds the batched-API
numbers (1.17x 2D / 1.40x 3D on top of the backend) to the performance
section.

basnijholt merged commit d65ac17 into main on Jun 10, 2026 (17 checks passed)